Abstract
Background: Prognostic classification of myelodysplastic syndromes (MDS) has traditionally relied on scoring systems such as IPSS, WPSS, and IPSS-R. These scoring systems omit several important aspects of the patient's condition, notably age, gender, and performance status (PS), so their prognostic predictions do not fully reflect the actual patient situation and are not accurate enough. A more comprehensive prognostic system suited to real-world data in clinical practice is therefore desirable. Here, we demonstrate the use of a more accurate machine learning-based model.
Methods: Using machine learning, we analyzed a clinical dataset of 333 patients with MDS diagnosed at the University of Tokyo Hospital between April 1996 and March 2020, and created two novel prediction models for MDS.
1) A novel prognostic prediction system was constructed by machine learning on patient information including age, gender, PS, white blood cell count, neutrophil count, hemoglobin level, platelet count, blast counts in bone marrow and peripheral blood, chromosomal abnormalities, and treatment. Cases were randomly assigned to the training and test datasets: 70% of the patients in the dataset were used as training data, and the remaining 30% were held out as test data. First, we used the training dataset to build a prediction system linking clinical information with subsequent leukemogenesis and mortality. Next, we evaluated the performance of the system on the test data, including the AUC and accuracy.
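The split-and-evaluate procedure described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the pipeline used in the study: the function names (`split_train_test`, `accuracy`, `auc`), the fixed seed, and the 0.5 decision threshold for accuracy are all assumptions made for the example.

```python
import random

def split_train_test(records, train_fraction=0.7, seed=0):
    """Randomly assign records to a training set and a test set."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seed is an arbitrary choice for reproducibility
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def accuracy(y_true, y_score, threshold=0.5):
    """Fraction of cases whose thresholded score matches the 0/1 label."""
    hits = sum((s >= threshold) == bool(y) for y, s in zip(y_true, y_score))
    return hits / len(y_true)

def auc(y_true, y_score):
    """AUC as the probability that a random positive case outscores
    a random negative case (ties count as 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# 333 patient IDs split 70/30, as in the Methods
train, test = split_train_test(range(333))
print(len(train), len(test))  # 233 100
```

With 333 cases, a 70/30 split gives 233 training and 100 test cases under the rounding used here; the abstract does not state the exact counts.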
2) To assess the benefit of drugs administered for the treatment of MDS, their effectiveness must be compared among patients with a similar level of risk. To this end, we devised a leukemogenesis index, by which we classified MDS patients into two groups: low risk (index < 0.5) and high risk (index > 0.5). Within these leukemogenesis risk-matched subgroups, we compared mortality between patients who received the therapeutic agent and those who did not. The outcomes of patients in the risk-matched subgroups indicate the real-world effectiveness of the therapeutic agents.
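The risk-matched comparison described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the record fields (`index`, `treated`, `died`) and the toy patients are hypothetical, and because the text does not say how an index of exactly 0.5 is handled, such cases are simply excluded here.

```python
from collections import defaultdict

def risk_group(index, cutoff=0.5):
    """Low risk below the cutoff, high risk above it."""
    if index < cutoff:
        return "low"
    if index > cutoff:
        return "high"
    return None  # index == cutoff is not defined in the source; excluded here

def mortality_by_stratum(patients):
    """Crude mortality for each (risk group, treated) cell."""
    counts = defaultdict(lambda: [0, 0])  # cell -> [deaths, total]
    for p in patients:
        group = risk_group(p["index"])
        if group is None:
            continue
        cell = counts[(group, p["treated"])]
        cell[0] += p["died"]
        cell[1] += 1
    return {key: deaths / total for key, (deaths, total) in counts.items()}

# Hypothetical toy records, for illustration only
patients = [
    {"index": 0.2, "treated": True, "died": 0},
    {"index": 0.3, "treated": False, "died": 1},
    {"index": 0.8, "treated": True, "died": 1},
    {"index": 0.9, "treated": True, "died": 0},
    {"index": 0.7, "treated": False, "died": 1},
]
rates = mortality_by_stratum(patients)
print(rates[("high", True)], rates[("high", False)])  # 0.5 1.0
```

Comparing treated and untreated patients only within the same risk stratum is what keeps the comparison from being dominated by differences in baseline risk between the groups.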
Results: We first searched for the optimal machine learning algorithm among 15 candidates using PyCaret, a machine learning library. CatBoost performed best in terms of AUC (area under the curve) and accuracy. Using the CatBoost algorithm, prediction of mortality in MDS patients yielded an AUC of 0.73 and an accuracy of 0.75. The AUC of this novel machine learning system was 0.75, an improvement of 0.16 (16 percentage points) over the AUC of 0.59 for the traditional IPSS system. Prediction of leukemogenesis yielded an AUC of 0.73 and an accuracy of 0.76.
To assess the effectiveness of administered therapeutic agents, their effects must be examined among MDS patients in subgroups of uniform predicted risk at presentation, when drug treatment is considered. To this end, we devised an evaluation method, leukemogenesis risk-matched analysis, based on the risk index. Among non-transplantable patients, we compared mortality in these risk-matched subgroups between the Aza (azacytidine) treatment group and the other-treatment group. In the low-risk group, the Kaplan-Meier estimate of mortality by 1 year was 85% in both the Aza treatment group and the other-treatment group. In the high-risk group, the Kaplan-Meier estimates of mortality by 1 year were 64% in the Aza treatment group and 42% in the other-treatment group. Because the high-risk group comprised only 60 cases, the difference did not reach statistical significance.
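For readers unfamiliar with the Kaplan-Meier estimates quoted above, the product-limit calculation can be sketched as follows. This is a minimal standard-library illustration with hypothetical follow-up data, not the analysis code used in the study; the function names and the toy times (in months) are assumptions.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with an observed death.

    times: follow-up durations; events: 1 = death observed, 0 = censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Gather everyone whose follow-up ends at time t
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk  # product-limit step
            curve.append((t, survival))
        at_risk -= leaving  # deaths and censored cases both leave the risk set
    return curve

def survival_at(curve, t):
    """Estimated survival probability at time t (step function)."""
    s = 1.0
    for time, surv in curve:
        if time > t:
            break
        s = surv
    return s

# Hypothetical follow-up in months; 1 = died, 0 = censored
times = [3, 6, 6, 9, 12, 15]
events = [1, 0, 1, 1, 0, 0]
curve = kaplan_meier(times, events)
print(round(survival_at(curve, 12), 3))  # 0.444
```

The estimated mortality by a given time is one minus the survival estimate at that time, so in this toy example the 12-month mortality estimate is about 0.556.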
Conclusion: Using machine learning, we have developed novel, comprehensive, and highly accurate prediction models of leukemogenesis and prognosis for MDS patients. We also developed a novel system, "leukemogenesis risk-matched analysis," to infer the real-world effectiveness of Aza by stratifying MDS patients according to leukemogenesis risk, as assessed by machine learning from patient information at the time of initial diagnosis. In addition, the leukemogenesis risk index can distinguish high-risk from low-risk patients, which is useful in supporting treatment management.
Honda: Takeda Pharmaceutical: Other: Lecture fee; Otsuka Pharmaceutical: Other: Lecture fee; Chugai Pharmaceutical: Other: Lecture fee; Ono Pharmaceutical: Other: Lecture fee; Jansen Pharmaceutical: Other: Lecture fee; Nippon Shinyaku: Other: Lecture fee. Kurokawa: AbbVie GK: Research Funding, Speakers Bureau; Daiichi Sankyo Company.: Research Funding, Speakers Bureau; Nippon Shinyaku Co., Ltd.: Research Funding, Speakers Bureau; Kyowa Hakko Kirin Co., Ltd.: Research Funding, Speakers Bureau; Sumitomo Dainippon Pharma Co., Ltd.: Research Funding, Speakers Bureau; Chugai Pharmaceutical Company: Research Funding, Speakers Bureau; Takeda Pharmaceutical Company Limited.: Research Funding, Speakers Bureau; ONO PHARMACEUTICAL CO., LTD.: Research Funding, Speakers Bureau; Otsuka Pharmaceutical Co., Ltd.: Research Funding, Speakers Bureau; Eisai Co., Ltd.: Research Funding, Speakers Bureau; MSD K.K.: Research Funding, Speakers Bureau; Astellas Pharma Inc.: Research Funding, Speakers Bureau; Pfizer Japan Inc.: Research Funding, Speakers Bureau; Teijin Limited: Research Funding, Speakers Bureau.